StatSpace: A Unified Platform for Statistical Data Exploration
Abstract
In recent years, the amount of statistical data available on the web has been growing fast. Numerous organizations and governments publish data sets in a multitude of formats and encodings, using different scales, and providing access through a wide range of mechanisms. Due to such inconsistent publishing practices, integrated analysis of statistical data is challenging. StatSpace tackles this problem through semantic integration and provides uniform access to disparate statistical data. At present, it incorporates more than 1,800 data sets published by a variety of data providers including the World Bank, the European Union, and the European Environment Agency. StatSpace transparently lifts data from raw sources, maps geographical and temporal dimensions, aligns value ranges, and allows users to explore and integrate the previously isolated data sets. This paper introduces the constituent elements of the StatSpace architecture – i.e., a metadata repository, URI design patterns, and supporting services – and demonstrates the usefulness of the resulting Linked Data infrastructure by means of use case examples.
Similar resources
An EM Algorithm for Estimating the Parameters of the Generalized Exponential Distribution under Unified Hybrid Censored Data
Unified hybrid censoring is a mixture of generalized Type-I and Type-II hybrid censoring schemes. This article presents statistical inference on the parameters of the Generalized Exponential Distribution when the data are obtained from the unified hybrid censoring scheme. It is observed that the maximum likelihood estimators cannot be derived in closed form. The EM algorithm for computing the ma...
KNIME for reproducible cross-domain analysis of life science data.
Experiments in the life sciences often involve tools from a variety of domains such as mass spectrometry, next generation sequencing, or image processing. Passing the data between those tools often involves complex scripts for controlling data flow, data transformation, and statistical analysis. Such scripts are not only prone to be platform dependent, they also tend to grow as the experiment p...
Model Selection Based on Tracking Interval Under Unified Hybrid Censored Samples
The aim of statistical modeling is to identify the model that most closely approximates the underlying process. The Akaike information criterion (AIC) is commonly used for model selection, but the precise value of the AIC has no direct interpretation. In this paper we use a normalization of a difference of Akaike criteria to compare two rival models under unified hybrid cens...
ExpoDB: An Exploratory Data Science Platform
We are entering a new data era: on the one hand, we are witnessing an unprecedented explosion of data volume and variety; and on the other hand, the data is now becoming increasingly interconnected yet disconnected. To derive insights from data, there is a pressing need to knit together a data model that is naturally heterogeneous while deeply interconnected. To construct a unified view of data...
Factors Affecting Usage of Computer-Assisted Audit Techniques with Emphasizing Auditor’s Characteristics: Unified Theory of Acceptance and Use of Technology
The aim of this study is to design a model of the factors affecting the usage of computer-assisted audit techniques, based on the unified theory of acceptance and use of technology and taking auditors’ characteristics into account. The research has an applied purpose and uses a descriptive survey for data collection. The statistical population includes auditors employed in audit firms, of whom 311 auditors are s...